Computational science programs are typically assessed with a series of test cases, ranging from order-of-magnitude estimates ("those numbers don't look right") to comparisons with experimental data. I've attempted to divide these test cases into categories, where each category shares similar expectations as to what 'close enough' means and has similar implications for the potential causes of discrepancies.
- Order-of-magnitude estimates from general principles, back-of-the-envelope calculations, or experience with similar problems.
- Analytically solvable cases.
- Small cases that are accurately solvable.
For small systems or simple cases, there are often multiple solution methods, and some of them are very accurate (or exact). The methods can be compared for these small systems. An important feature of this category is that any approximations can be controlled and their effects made arbitrarily small.
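As a minimal sketch of this category (the test problem is my illustration, not from any particular code's suite), the following compares a finite-difference estimate of the 1D harmonic oscillator ground-state energy against the exact analytic value E0 = 1/2 (in units where hbar = m = omega = 1). The grid spacing is the controlled approximation; refining it drives the error down as far as desired.

```python
# Illustrative example: finite-difference ground state of the 1D harmonic
# oscillator (hbar = m = omega = 1), checked against the exact E0 = 0.5.
import numpy as np

def fd_ground_state_energy(n_points, x_max=10.0):
    """Lowest eigenvalue of the finite-difference Hamiltonian on a grid."""
    x, dx = np.linspace(-x_max, x_max, n_points, retstep=True)
    # Kinetic term -1/2 d^2/dx^2 via the 3-point stencil, plus x^2/2 potential.
    diag = 1.0 / dx**2 + 0.5 * x**2
    off = -0.5 / dx**2 * np.ones(n_points - 1)
    h = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(h)[0]

exact = 0.5  # analytic ground-state energy
for n in (100, 200, 400, 800):
    e0 = fd_ground_state_energy(n)
    print(f"n = {n:4d}  E0 = {e0:.8f}  error = {abs(e0 - exact):.2e}")
```

Running this shows the error shrinking steadily as the grid is refined, which is exactly the "controllable approximation" property that defines this category.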
- Results from the same (or similar) algorithm.

Comparing the same system with the same algorithm should yield close results, but now there is additional uncertainty from the exact implementation and from sensitivity to input precision (particularly when comparing with results from a journal article).
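For instance (all numbers here are invented), a comparison against a value quoted in an article might budget for both the quoted error bar and the rounding of the printed digits:

```python
# Illustrative sketch: comparing a result against a published value. The
# printed number is rounded to its last digit, so the comparison tolerance
# has to include that rounding as well as the quoted error bar.
published = -14.667    # value as printed in the article (3 decimals)
published_err = 0.002  # error bar quoted in the article
rounding = 0.5e-3      # half a unit in the last printed digit

my_result = -14.66734

tolerance = published_err + rounding
diff = abs(my_result - published)
if diff <= tolerance:
    print(f"agree: |diff| = {diff:.5f} <= {tolerance:.5f}")
else:
    print(f"DISCREPANCY: |diff| = {diff:.5f} > {tolerance:.5f}")
```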
- Results from a different algorithm.

This looks similar to the earlier category of comparing with exact or more precise algorithms, but this is the case where there are multiple methods (with different approximations) for handling larger systems. There are likely to be more approximations (and probably uncontrolled approximations) involved.
(In electronic structure there are Hartree-Fock (HF), several post-HF methods, Density Functional Theory, and Quantum Monte Carlo, and possibly a few others.) Now one has to deal with all of the above possibilities for two programs, rather than just one.
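As a toy stand-in for this situation, here are two unrelated algorithms computing the same integral, deterministic quadrature and plain Monte Carlo, with agreement judged in units of the stochastic method's error bar (the integrand and sample size are arbitrary choices for illustration):

```python
# Illustrative sketch: two different algorithms for the same quantity.
# Quadrature has a tiny controlled error; Monte Carlo has run-to-run
# statistical scatter, so 'close enough' is judged in sigmas.
import numpy as np
from scipy.integrate import quad

def integrand(x):
    return np.exp(-x * x)

# Algorithm 1: adaptive deterministic quadrature.
quad_result, quad_abserr = quad(integrand, 0.0, 1.0)

# Algorithm 2: plain Monte Carlo over the same interval.
rng = np.random.default_rng(2024)
samples = integrand(rng.uniform(0.0, 1.0, size=100_000))
mc_result = samples.mean()
mc_err = samples.std(ddof=1) / np.sqrt(samples.size)

n_sigma = abs(quad_result - mc_result) / mc_err
print(f"quadrature  = {quad_result:.6f}")
print(f"Monte Carlo = {mc_result:.6f} +/- {mc_err:.6f}")
print(f"difference  = {n_sigma:.2f} sigma")
```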
- Experimental results.

In some ways this is quite similar to the previous category, except that determining 'sufficiently close' requires some understanding of the experimental uncertainties and corrections, in addition to possible program errors.
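A sketch of such a check, with made-up numbers, combining the computed and experimental uncertainties in quadrature:

```python
# Illustrative sketch: comparing a computed value with a measurement.
# All values here are invented for the example.
computed, computed_err = 7.41, 0.05      # result and its uncertainty estimate
experiment, experiment_err = 7.37, 0.03  # measured value and quoted error

combined = (computed_err**2 + experiment_err**2) ** 0.5
n_sigma = abs(computed - experiment) / combined
print(f"difference = {abs(computed - experiment):.3f} ({n_sigma:.1f} sigma)")
if n_sigma > 3.0:
    print("discrepancy worth investigating")
```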
The testing process for each case involves running the test case and deciding whether the program result is 'close enough'. If it is, proceed to the next test case. If it is not, work to discover the cause of the discrepancy. A minimal sketch of this loop appears below.
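(The case names, reference values, and tolerances here are purely illustrative, not real benchmarks.)

```python
# Illustrative harness for the process described above: run each case,
# decide whether it is 'close enough', and stop on the first discrepancy.
def check(name, result, reference, tolerance):
    diff = abs(result - reference)
    if diff <= tolerance:
        print(f"PASS {name}: |diff| = {diff:.2e}")
        return True
    print(f"FAIL {name}: |diff| = {diff:.2e} > {tolerance:.2e}, investigate")
    return False

cases = [
    ("harmonic oscillator E0", 0.5000012, 0.5, 1e-4),
    ("published total energy", -14.66734, -14.667, 2.5e-3),
]

for name, result, reference, tolerance in cases:
    if not check(name, result, reference, tolerance):
        break  # work to discover the cause before moving on
```

Next post I'll look at a list of possible causes.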